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Abstract 

In many practical decision-making problems it happens that func¬ 
tions involved in optimization process are black-box with unknown an¬ 
alytical representations and hard to evaluate. In this paper, a global 
optimization problem is considered where both the goal function f{x) 
and its gradient f'{x) are black-box functions. It is supposed that f'{x) 
satisfies the Lipschitz condition over the search hyperinterval with an 
unknown Lipschitz constant K. A new deterministic ‘Divide-the-Best’ 
algorithm based on efficient diagonal partitions and smooth auxiliary 
functions is proposed in its basic version, its convergence conditions 
are studied and numerical experiments executed on eight hundred test 
functions are presented. 

Key Words. Global optimization, deterministic methods, Lipschitz gradients, 
unknown Lipschitz constant. 


1 Introduction 

In many important applied problems, some decisions should be made by find¬ 
ing the global optimum of a multiextremal objective function subject to a set 
of constrains (see, e. g., lainiligEIllMlIMlIMlIiaiMllbOlEIllSl] and the 
references given therein). Frequently, especially in engineering applications, 
the functions involved in optimization process are black-box with unknown 
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analytical representations and hard to evaluate. Such computationally chal¬ 
lenging decision-making problems often cannot be solved by traditional op¬ 
timization techniques based on strong suppositions (such as, e. g., convexity) 
about the problem. 

Because of significant computational costs involved, normally a limited 
number of functions evaluations are available for a decision-maker (physicist, 
biologist, economist, engineer, etc.) when he/she optimizes this kind of 
functions. Hence, the main goal is to construct fast global optimization 
algorithms that generate acceptable solutions with a relatively small number 
of functions evaluations. 

Usually, the methods applied in this context are subdivided in stochastic 
and deterministic. Stochastic approaches (see, e. g., m El ESI EH |32l [531151]) 
can often work in a simpler manner than the deterministic algorithms. They 
can be also suitable for the problems where the functions evaluations are 
corrupted by noise. However, solutions found by some stochastic algorithms 
(e. g., by popular heuristic methods like evolutionary algorithms, simulated 
annealing, etc.; see, e. g., IS EIIESI EH |52]) can be only local solutions that 
can be located far away from the global one. This fact can preclude these 
methods from their usage in practice if an accurate estimate of the global 
solution is required. Therefore, our attention in the paper is focused on 
deterministic approaches. 

Deterministic global optimization is an important applied field (see, e. g., 
laiiHi EsmaEH 155] 1. As a rule, deterministic methods exhibit (under 
certain conditions) rigorous global convergence properties and allow one to 
obtain guaranteed estimations of global solutions. However, deterministic 
models can still require too many function evaluations to achieve adequately 
good solutions for the considered global optimization problems. 

One of the simplest techniques in this framework is represented by the 
so-called derivative-free (or direct) approach (see, e.g., jH El ESI ESI E7]l. 
often used for solving important applied problems (see, e. g., pattern search 
methods |22j . the DIRECT method |20j . the response surface, or surrogate 
model methods |21j . etc.). Unfortunately (see, e. g., i6i Ea Eaim), these 
methods either are designed to detect only stationary points or can demand 
too high computational effort for their work. As observed, e. g., in |181149] . 
if no particular assumptions are made on the objective function, any finite 
number of function evaluations cannot guarantee getting close to the global 
minimum, since the function may have very deep and narrow peaks. 

The Lipschitz-continuity assumption is one of the natural suppositions 
on the objective function (in fact, in technical systems the energy of change 
is always limited). The problem involving Lipschitz functions is said to be 
the Lipschitz global optimization problem (see, e. g., O EH E3 IMlIiailel 

EIlEll). 

The global optimization problem with a differentiable objective function 
having the Lipschitz gradient (with an unknown Lipschitz constant) is an 
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important class of Lipschitz global optimization problems. Formally, this 
class of problems can be stated as follows: 

f* = f{x*) = min f{x), (1) 

x£D 

\\f'{x') — f'{x'')\\<K\\x' — x''\\, x',x"^D, 0 < K < oo, (2) 

where 

D = [a, b] = {xeR^ : a{j) < x{j) < b{j)}. (3) 

It is assumed here that the objective function f{x) can be black-box, 
multiextremal, its gradient f'{x) = • • •, (which can 

be itself an expensive multiextremal black-box vector-function) can be cal¬ 
culated during the search, and f'{x) is Lipschitz-continuous with some un¬ 
known constant K, 0 < K < oo, over D. Problem (II])-(l3|) is frequently met 
in engineering (see, e. g., iMi aa El]), for instance, in electrical engineering 
design (see, e. g., mm)- 

There are known several methods for solving this problem that can be 
distinguished with respect to the way the Lipschitz constant K from (|3|) is 
estimated in their computational schemes. There exist algorithms using an a 
priori given estimate of K (see, e. g., dJEa)) adaptive estimates (see, e. g., 
|131 im l39]L and adaptive estimates of local Lipschitz constants (see, e. g., 
[Mills]). Recently, methods working with multiple estimates of K chosen 
from a set of possible values have been also proposed (see mm)- 

This paper is devoted to developing a new global optimization algorithm 
for solving problem (II])-(l3|. The new method adaptively estimates the un¬ 
known Lipschitz constant K from ([2|) during the search and is based on 
efficient diagonal partitions of the search domain proposed recently. Theo¬ 
retical background for the introduced method is drawn in Section 2. The 
algorithm is presented and analyzed theoretically in Section 3. Section 4 
presents results of numerical experiments executed on several hundreds of 
multiextremal test functions. Finally, Section 5 concludes the paper. 

2 Theoretical background 

In this Section, some important theoretical results, necessary for introducing 
the new algorithm, are briefly described. 

2.1 ‘Divide-the-Best’ algorithms 

As known, many global optimization methods (of both stochastic and deter¬ 
ministic types) have a similar structure. Therefore, several approaches to the 
development of a general framework for describing global optimization algo¬ 
rithms and providing their convergence conditions in a unified manner have 
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been proposed (see, e. g., laillKIHllae]). The ^Divide-the-Best’ approach (see 
@01115]) is one of such unifying schemes. It generalizes both the schemes of 
adaptive partition |36] and characteristic nsmsiisi] algorithms, which are 
widely used for constructing and studying global optimization methods. 



Figure 1: Flow chart of ‘Divide-the-Best’ algorithms. 


In a ‘Divide-the-Best’ algorithm (its generic iteration is represented by 
the flow chart in Figured]), given a vector q of the method parameters, an 
adaptive partition of the search domain D from (j3]) into a finite set {D^} 
of robust subsets is considered at every iteration k. In Step 1, the Lips- 
chitz constant {K from ([2]), for the objective function gradient f'{x)] or L in 
the case of the Lipschitz objective function f{x)) is estimated in some way. 
Basing on the previously obtained information about the objective 

function, the ‘merit’ (called characteristic) Ri of each subset (see Step 2 in 
Figured]) is estimated for performing a further, more detailed, investigation 
(see Steps 3 and 4 in Figured])- The best (in a predefined sense) character- 
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istic achieved over a hyperinterval corresponds to a higher possibility to 
determine the global minimum point within (see Step 3). The hyperin¬ 
terval is then partitioned at the next iteration of the algorithm. More 
than one ‘promising’ hyperintervals can be subdivided at each iteration. 

Various techniques (for instance, in the context of the geometric ap¬ 
proach, see, e. g., HU EH 061 EU) for selection of hyperintervals to be parti¬ 
tioned (see Step 3 in Figure [T]) and for executing this partitioning (by means 
of an operator P, see Step 4 in Figured] and the following subsection 2.2) 
can be used within this scheme. 

As the stopping criteria, one can check, e. g., the volume of a hyperinter¬ 
val with the best characteristic or depletion of computational resources such 
as the maximal number of trials (i. e., evaluations of the objective function 
and its gradient). 

Theoretical analysis of the ‘Divide-the-Best’ algorithms with different 
types of characteristics and partition operators is performed in llOlllS]. A 
particular attention is given there to situations (important in applied prob¬ 
lems) when conditions of global (local) convergence are satisfied not in the 
whole domain D, but only in its small subregion(s). This can happen, e. g., 
in many Lipschitz global optimization algorithms that either underestimate 
the Lipschitz constant during their work or use local information in various 
subregions of D (see, e. g., |23l dO] 051 [51]). It should be also noticed that 
the ‘Divide-the-Best’ scheme can be successfully applied to develop parallel 
multidimensional global optimization methods (see, e. g., |161151) 1. 

2.2 Efficient partition strategy 

Regarding the partition strategies (see partitioning operator P on Step 4 in 
Figured])) the diagonal partition strategies (see the references in |3611441 0^ 
06]) are taken into account in this paper, although other types of subdivi¬ 
sions are worth to be considered (see, e. g., simplicial partitions in |3l l331l34] L 
In the diagonal approach, the hyperinterval D from (|3]) is subdivided into 
a set of smaller hyperintervals, the objective function (and its gradient) is 
(are) evaluated only at two vertices corresponding to the main diagonal of 
the generated hyperintervals (see, e. g., vertices a* and of a hyperinter¬ 
val Di in Figure |2]), and the results of these trials are used to choose a 
hyperinterval for subsequent subdivisions. This approach has a number of 
interesting theoretical properties and has proved to be efficient in solving 
practical problems. 

First, efficient one-dimensional global optimization algorithms can be eas¬ 
ily extended to the multidimensional case by means of the diagonal approach 
(see, e. g., |4411451 06] 1. For instance, in order to obtain the characteristic Ri 
of a multidimensional hyperinterval Dj, some univariate characteristics can 
be used as prototypes. After an appropriate transformation they can be con¬ 
sidered along the one-dimensional segment being the main diagonal [oj, 6i] 
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Figure 2: Obtaining the lower bound Ri for the objective function f{x) 
with the Lipschitz gradient over Di = [ai,bj\ by using smooth auxiliary 
function A E [a*, 6*], along the main diagonal [aj,6j] of Di. 


of the hyperinterval Di (see smooth auxiliary function </>j(A) in Figure |2]). 

Second, the diagonal approach is computationally close to one of the 
simplest strategies—center-sampling technique (see, e. g., 0 El El uni Eo])— 
but at the same time, the function trials are executed at two points of each 
hyperinterval, providing so more information about the function over the 
subregion under investigation than center-sampling algorithms that evalu¬ 
ate f{x) (and f'{x)) only at one central point. 

An important issue in the framework of diagonal algorithms is the way 
the hyperintervals are partitioned during the search. Various exploration 
techniques based on different diagonal partition strategies are studied, e. g., 
in gn ia sg. It is shown that partition strategies traditionally used in 
diagonal methods can perform many redundant trials and, hence, do not fulfil 
the requirements of computational efficiency (this redundancy significantly 
slows down the search in the case of expensive functions). 

A diagonal partition strategy introduced in iniia allows one to avoid 
the computational redundancy of diagonal methods. In contrast to tradi¬ 
tional diagonal schemes, the proposed strategy (called non-redundant, or 
efficient diagonal partition strategy) produces regular trial meshes in such a 
way that one vertex where /(x) and f'{x) are evaluated can belong to several 
hyperintervals (up to 2^, N is the problem dimension from ([3]); see Figure El 
that shows how partitions of gray-colored hyperintervals are executed in the 
course of iterations 1-6, where trial points are represented by black dots). So, 
the time-consuming operation of evaluating both f[x) and f'{x) is replaced 
by a significantly faster procedure of reading (up to 2^ times) the previously 
obtained functions values, saved in a special database (see, e. g., gS]). In 
Figure El during the 6-th iteration the partition has been performed for free 


6 
















Figure 3: Example of partitioning a three-dimensional hyperinterval by the 
efficient diagonal partition strategy. 


since the required values of f{x) and f'{x) have been already obtained at 
the 4-th and 5-th iterations (trial points of the 6-th iteration are in brackets, 
see Figure [3jl). Hence, the non-redundant partition strategy speeds up the 
search and also leads to saving computer memory. It is important that this 
feature becomes more pronounced when the problem dimension N increases 
(see, e. g., HH El 05]). A formal procedure of subdivisions by means of the 
efficient diagonal partition strategy can be viewed in the next Section (see 
formulae (fT^ - (IT8]) l. 

A novel scheme for developing fast Lipschitz global optimization methods 
can be, thus, considered. It can rely on the non-redundant diagonal parti¬ 
tion strategy allowing an efficient extension of well-studied one-dimensional 
Lipschitz global optimization algorithms to the multidimensional case. In¬ 
teresting multidimensional diagonal methods, based on different ways for 
achieving the Lipschitz information and constructed in the framework of the 
efficient diagonal approach, have been proposed and their convergence prop¬ 
erties have been studied, e. g., in 1211 El SS]. In the following Section, this 
scheme is used (see |45) 1 to develop a new global optimization algorithm 
for solving problem ([Il)-(|3l). This algorithm uses smooth diagonal auxiliary 
functions along the main diagonals of hyperintervals and shows how the un- 
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known constant K from ([2]) can be adaptively estimated during the search. 


3 New algorithm SmoothD 

In this Section, a new algorithm SmoothD for solving problem ([I])-® is 
described. It generalizes the one-dimensional method adaptively estimating 
the constant K (see |39) 1 to the multidimensional case by using the diag¬ 
onal approach and the efficient partition strategy from mms]. First, the 
new method is presented and its computational scheme is given, then its 
convergence properties are analyzed. 

To describe the algorithm SmoothD formally, we need the following 
designations: 

k > 1 ~ the current iteration number of the algorithm; 
p{k) - the total number of function trials executed during the previous 
k — 1 iterations of the method; 

sequence of trial points generated by the algorithm in k 

iterations; 

= {f{xP^^^)} - the corresponding sequence of the objective func¬ 
tion values; 

{1 ^ j ^ sequences of the corresponding 

partial derivatives /j(x), 1 < J < iV, evaluated at 

M = M (fc) - the total number of hyperintervals of the current partition 
of the search domain at the beginning of the fc-th iteration; 

{D^} = {D^} = {Di}, 1 < i < M, - the current partition of the initial 
hyperinterval D into hyperintervals Di = [ai,bi]. 

For each hyperinterval Di the coordinates of the vertices a* and bi of its 
main diagonal [ai,bi], the corresponding values of /(oj) and /(^i), and the 
corresponding values of partial derivatives /j(aj) and fj{bi), 1 < j < N, are 
stored. 

Directional derivatives along main diagonals are then evaluated as fol¬ 
lows: 


where 


f'iai) 




fih) 




Ai = 


iQ'i 


N 


'i 




(4) 

(5) 

( 6 ) 
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is the length of the main diagonal of hyperinterval Di, 1 < i < M{k). 

New Algorithm SmoothD. Set the initial valne of the iteration connter 
k := 1. Evalnate the objective fnnction f{x) and its partial derivatives fj{x), 
1 < J < A', at the vertices a and b of the search domain D: x^ = a, x^ = b. 
Set both the cnrrent nnmber of trials p{l) := 2 and the cnrrent nnmber 
of hyperintervals M(l) := 1. The initial partition of the search domain 
is {D^} := {D}. 

Snppose that A; > 1 iteration of the algorithm have been already execnted. 
Iteration A: + 1 consists of the following Steps. 


Step 1 {Estimation of the Lipschitz constant for the gradient). Obtain 
the cnrrent estimate m of the Lipschitz constant K from (|2|) for the 
objective fnnction gradient as 

m = rmax{f^,m}, (7) 


where r > 1 is the reliability parameter of the method, ^ is a small 
positive valne (it ensnres the correct algorithm execntion when the 
valne m is too small; as a rnle, ^ < 10“®), and m is fonnd as 


m = max |rcj|, 
l<i<M{k) 

with 

|2(/(ai) - fih)) + ifiai) + fibi))A,\ + d, 

- =-- 

and 


di — {[2(/(oj) — / (6j)) + {f'{ai) + f'{bi))Ai]‘^ + 

+ {nbi)-f{ai))^A^}l 

Step 2 {Characteristics calculations). For each hyperinterval Di G 

1 < i < M{k), calcnlate its characteristic Ri, being the minimnm of 
the anxiliary fnnction (fi{X) on the diagonal [ai,bi] (see Fig. [2|), as 
follows. 

Step 2.1. If ineqnality 

KiyiWiivi) < o> 

is verified, where 7r{{yi), 7r((?/() are fonnd by formnlae 

T^iidi) = myi + Bi, 7r'(y') = my{ + Bi, 

with 

_ A, , f{h) - f'{ai) , 

4 4m 

f{ai) - f{bi) + f'{bi)Aj + 0.5mA^^ 
mAi + f'{bi) - f'{ai) 


9 





A. f{k)-fiai) ^ 

4 4m 

fjai) - fjbj) + f'jbi)Aj + 0.5mA‘^ 
mAi + f'{bi) - f'{ai) 


Bi = f'{bi) - 2myi + mAi, 
then go to Step 2.2. Otherwise, go to Step 2.3. 
Step 2.2. Calculate characteristic Ri as 


Ri = vain{f{ai),(t)i{xi),f{bi)], (8) 


where 


4>iixi) = f{bi) - f{bi)Ai - OAmAj + rnyf - O.hmfj^ 


and 

Xi = 2yi - m~^f{bi) - A^. 

Go to Step 3. 

Step 2.3. Calculate characteristic Ri as 


Ri = min{/(oi), f{bi)]. (9) 

Step 3 {Hyperinterval selection). Select a hyperinterval Dt G for the 

further partitioning such that condition 

t = arg min Ri (10) 

l<i<M{k) 

is verihed. If there are several hyperintervals satisfying condition (llOp . 
then choose among them the hyperinterval with the minimal index t. 

Step 4 {Stopping criterion). If 


\\at - bt\\ < e||a - 6||, (11) 

where e > 0 is a given accuracy, then stop the algorithm. 

Take as an estimate of the global minimum f* value 

zl = m.m{z I z G 


attained at point 


Xfc = arg min /(x*). 


Otherwise, go to Step 5. 
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Step 5 {Generation of points u and v). Calculate coordiantes of points u 
and V for partitioning the hyperinterval Dt = [a^,6^]: 

u = (a(l), ... ,a{i - l),a{i) + 

+ ^(^(0 - a{i)), a{i + 1),..., a{N)), (12) 

V = (6(1), ...,b{i- l),b{i)+ 

+ ‘^{a{i) - b{i)), b{i + 1),..., b{N)), (13) 

where a{j) = a^{j), b{j) = bf{j), 1 < j < N, and i is found as 

i = min arg max {\b{j) — a(i)|}. (14) 

l<j<N 

Check whether the function f{x) and its partial derivatives fj{x), 1 < 
j < N, have been already evaluated at the points u and v. 

Step 5.1. If f{x) and fj{x), 1 < J < fV, have been previously evalu¬ 
ated at both the points u and n, then set p{k + 1) := p{k) and go to 
Step 6. 

Step 5.2. If f{x) and fj{x), 1 < J < fV, have been previously evalu¬ 
ated at only one point, e. g., at point u (similar operation is performed 
in the case when the functions evaluations have been performed only 
at point v), then perform a new trial of f{x) and /j(x), 1 < j < iV, at 
point 

^P{k)+i ^ ^ ^ 

set p{k + 1) := p{k) -|- 1 and go to Step 6. 

Step 5.3. If neither f{x) nor fj{x), 1 < j < fV, have been previously 
evaluated at points u, v, then evaluate f{x) and /j(x), 1 < j < iV, at 
points 

^p{k)+l ^ ^p{k)+2 ^ 

set p{k + 1) := p{k) -|- 2 and go to Step 6. 

Step 6 {Efficient diagonal partition). Obtain a new partition of 

the search domain D: 

{D-'+'l = {I)‘) \ {D,} U ^.) U {0“ (15) 

where vertexes of the main diagonals of new hyperintervals 
■^Af(fc)+i ■^Af(fc)+2 Siven by formulae 

= «'M{k)+2 = u, b^~^^ = bM(k)+l = X. (16) 
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«M(fc)+l — ) bM{k)+l — V, (17) 

0‘M(k)+2 = U, bM{k)+2 = bt, (18) 

respectively. 

Set M(k + 1) := M{k) + 2, increase the iteration connter k := k + 1 
and go to Step 1. 

The introdnced algorithm SmoothD belongs to the class of diagonally 
extended geometric algorithms and, thns, to a more general class of ‘Divide- 
the-Best’ algorithms (see m)- Its reliability parameter r from ([7| controls 
the estimate m of the Lipschitz constant K of f'{x) and affects the algo¬ 
rithm’s convergence, as it will be shown in the next Section. Namely, by 
increasing r the reliability of the method also increases. As this parameter 
decreases, the search rate increases, bnt the probability of convergence to a 
point other than the global minimizer of f{x) grows as well. 

The next Theorem establishes convergence properties of the proposed 
algorithm. 

Theorem 1 For any function f{x) with the gradient satisfying the Lipschitz 
condition m with the constant K, 0 < K < oo, there exists a value r* of the 
reliability parameter r of the SmoothD algorithm such that for any r > r* 
the infinite (e = 0 in HID ) sequence of trial points {x^)^)}, generated by the 
SmoothD algorithm during minimization of f(x), will converge only to the 
global minimizers of f{x). 

Proof. This resnlt can be obtained as a particnlar case of the general 
convergence stndy of ‘Divide-the-Best’ algorithms from m and its proof is 
so omitted. □ 

It shonld be noticed in this context that if a valne of r smaller than r* 
is nsed, the algorithm can converge (see the general analysis execnted for 
‘Divide-the-Best’ methods in uni ) to a local minimizer of /(x) or to the 
bonndary of a snbregion of D corresponding to the best characteristics ([8|- 
m- This sitnation indicates the necessity to increase the valne of the reli¬ 
ability parameter, i.e., it is a practical hint for the choice of r. 

4 Numerical experiments 

In this Section, resnlts of nnmerical experiments aimed at evalnating the 
SmoothD method performance and comparing it with some known global 
optimization techniqnes are given. Particnlarly, two popnlar global optimiza¬ 
tion algorithms (downloadable from http: //www4.ncsu. edu/~ctk/S0FTWARE/DIRECTv204. tar .gz) 
have been taken for nnmerical comparisons: the DIRECT method from |20) 
and its locally-biased version DIRECT/ from jlO] . Both of them partition 
the search region D into small hyperintervals and are widely nsed for solving 
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applied black-box global optimization problems, being therefore important 
competitors for the SmoothD method. They also belong to the class of 
‘Divide-the-Best’ algorithms but do not use the information about the ob¬ 
jective function gradient during their work. 

All experiments were performed on a 3.40 GHz Intel(R) Core(TM) i7- 
2600 CPU PC with 12 Gb memory using 64-bit Windows system. 

This Section is structured as follows. First, methodology of executing 
numerical experimentation is described presenting test functions and com¬ 
parison criteria used. Then, numerical issues on the reliability parameter 
of the new method are discussed and sensitivity analysis of this parameter 
is performed. Finally, overall numerical results are summarized and com¬ 
mented on. 

4.1 Classes of test functions and comparison criteria 

Eight classes of differentiable test problems generated by the GKLS-generator 
from |12) were used to perform numerical experiments. This generator con¬ 
structs three types (non-differentiable, continuously differentiable, and twice 
continuously differentiable) of classes of multidimensional and multiextremal 
test functions with known local and global minima. The generation proce¬ 
dure consists of defining a convex quadratic function systematically distorted 
by polynomials. 

Each test class provided by the generator consists of 100 functions and 
is defined by the following parameters: (i) problem dimension N, (ii) num¬ 
ber of local minima m, (hi) global minimum value /*, (iv) radius of the 
attraction region of the global minimizer p, (v) distance from the global 
minimizer to the quadratic function vertex d. The other necessary pa¬ 
rameters are chosen randomly by the generator for each test function of 
the class. A special notebook with a complete description of all functions 
is supplied to the user. The GKLS-generator always produces the same 
test classes for a given set of the user-defined parameters, allowing one to 
perform repeatable numerical experiments. By changing the user-defined 
parameters, classes with different properties can be created. The gener¬ 
ator is available on the AGM Gollected Algorithms (GALGO) database 
(the GALGO is part of a family of publications produced by the Associ¬ 
ation for Gomputing Machinery) and it is also downloadable for free from 
http://wwwinfo.dimes.unical.it/~yaro/GKLS.html 

In order to obtain comparable results, the same eight GKLS test classes 
of continuously differentiable functions of dimensions N = 2,3,4, and 5 de¬ 
fined by the same five parameters as in |44) were used (see Table [T]) . The 
global minimum value f* was equal to —1.0 and the number of local min¬ 
ima m was equal to 10 for all classes (these values are default settings of 
the GKLS-generator). Two test classes were considered for each particular 
dimension N: the ‘simple’ class and the ‘hard’ one. The difficulty of a class 
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Table 1: Description of the GKLS test classes used in numerical experiments 


Class 

Difficulty 

e 

N 

m 

r 

d 

P 

1. 

Simple 

10"^ 

2 

10 

-1.0 

0.90 

0.20 

2. 

Hard 

10-“^ 

2 

10 

-1.0 

0.90 

0.10 

3. 

Simple 

10"® 

3 

10 

-1.0 

0.66 

0.20 

4. 

Hard 

10-® 

3 

10 

-1.0 

0.90 

0.20 

5. 

Simple 

10"® 

4 

10 

-1.0 

0.66 

0.20 

6. 

Hard 

10"® 

4 

10 

-1.0 

0.90 

0.20 

7. 

Simple 

10-7 

5 

10 

-1.0 

0.66 

0.30 

8. 

Hard 

10-7 

5 

10 

-1.0 

0.66 

0.20 


was increased either by decreasing the radius p of the attraction region of 
the global minimizer x* (as for two- and hve-dimensional classes) or by in¬ 
creasing the distance d from x* to the quadratic function vertex (three- and 
four-dimensional classes). 

In all numerical experiments, the maximal allowed number of function 
trials was set equal to Pmax = 1000 000. A problem was considered to 
be solved by a method under examination if the method generated a trial 
point x' in an e-neighborhood of the global minimizer x* G D = [a,b], i.e., 

< ^{a{j) - b{j)), l<j<N, (19) 

where 0 < e < 1 is an accuracy coefficient (its values are given in the third 
column of Table [T]) . Such a type of stopping criterion is acceptable only when 
the global minimizer x* is known, i.e., in the case of test functions. When 
a real black-box objective function is minimized and global minimization 
methods have an internal stopping criterion (as that of the proposed algo¬ 
rithm, see formula (ffH)), they execute a number of iterations (that can be 
high) after a ‘good’ estimate of f* has been achieved in order to demonstrate 
the ‘goodness’ of the solution found (see, e. g., |T8l l36l 1^ 1. 

Since each evaluation of a real-life black-box objective function is usually 
a time-consuming operation (see, e. g., naiMiiMiiiaiiHiEiiisi]), the maxi¬ 
mal number of function trials executed by the methods until the satisfaction 
of the stopping criteria (|19p was chosen as the main criterion of the com¬ 
parison (other possible comparison indicators as the number of generated 
hyper intervals, the average number of trials, and so on can be also used, see, 

e.g., [SI mi El SHI). 

4.2 The reliability parameter and its sensitivity analysis 

In order to evaluate the influence of the reliability parameter r from ([7| on 
the convergence properties of the SmoothD method, the two-dimensional 
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Table 2: Influence of the reliability parameter on the SmoothD method 
performance in the case of the two-dimensional GKLS test class 1 


Indicator 

f = 1.2 

f = 1.8 

f = 2.8 

f = 3.8 

r = 4.8 

f = 5.8 

p*{f) 

199 

272 

332 

410 

424 

451 

Pavg(r) 

105.14 

169.63 

222.16 

293.52 

323.42 

341.60 

5(f) 

51 

81 

91 

98 

99 

100 


class 1 from Tabled] was taken. A problem from this class was considered to 
be solved by the new method if it generated a trial satisfying condition (1191) 
(if the method stopped due to its internal stopping criterion (II ip and con¬ 
dition (dHl) was not verified, the problem was considered to be unsolved by 
SmoothD with a given value of the reliability parameter). 

The following numbers were taken as indicators of the SmoothD method 
performance in this analysis subject to a fixed value f of the reliability pa¬ 
rameter r, equal to all 100 functions of the class: 

5(f) - number of solved problems among 100 of the test class; 
p*{r) - maximal number of trials required for the method to satisfy con¬ 
dition (|19l) for all 5(f) functions of the test class; 

Pa,vg{f) ” average number of trials required for the method to satisfy 
condition (|19l) for all 5(f) functions of the test class. 

Numerical results obtained by the proposed method on the considered 
two-dimensional test class when the reliability parameter varied from f = 1.2 
up to f = 5.8 are reported in Table [2| 

It can be seen from Table |2jthat with increasing the value of r the reliabil¬ 
ity (the number of solved problems 5(f)) of the method increases. However, 
the numbers of trials (both maximal and average) increase too. This is due 
to the fact (well known in Lipschitz global optimization; see, e. g., jl^) that 
every function optimized by the proposed method has its own value r* from 
Theorem 1. Therefore, when one executes tests with a class of 100 different 
functions it becomes difficult to use specific values of r for each function. 

In fact (see an illustration given in Figures |4| and [5|), for one particular 
function (see Figure H] left part) a relatively small value of r can be already 
sufficient to find its global minimizer in sense of condition (|19p . In this case, 
a higher value of the reliability parameter (see Figure IH right part) can cause 
the method to perform an additional examination of the search domain in the 
hope of capturing an eventual better minimum. Obviously, this exploration 
increases the number of trials performed by the method on this particular 
function (see Figure SJ right part) but allows the method to solve some 
other problem (as that in Figure |5|) for which a greater value of r is required. 
An insufficient (i.e., too small) value of r makes the method converge to a 
local minimizer, as it is shown in Figure [S] left part. It should be noticed 


15 







Figure 4: Trial points (black dots) generated by the SmoothD method 
when minimizing function number 58 from the GKLS test class 1, x* ~ 
(—0.2371,0.5791) {left: r = 1.2, 152 trials; right: r = 5.8, 452 trials). 



Figure 5: Trial points (black dots) generated by the SmoothD method 
when minimizing function number 54 from the GKLS test class 1 , x* ~ 
(0.6841,0.0664) {left: f = 1.2, 20 trials; right: f = 5.8, 375 trials). 
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Table 3: Analysis of the adaptive reliability parameter of the SmoothD 
method in the case of the two-dimensional GKLS test class 1 


Indicator 

f = 1.2 

f = 1.8 

f = 2.8 

f = 3.8 

r = 4.8 

f = 5.8 

p*{r) 

201 

277 

C = 10.0 

377 

410 

424 

453 

Pavg(r) 

113.16 

176.06 

250.45 

294.64 

324.10 

342.01 

5(f) 

62 

86 

96 

98 

99 

100 

p*{r) 

179 

283 

C = 20.0 
379 

411 

424 

453 

Pavg(r) 

122.01 

181.26 

252.86 

295.51 

324.73 

342.47 

5(f) 

69 

86 

96 

99 

99 

100 

p*{f) 

214 

293 

C = 50.0 
387 

414 

425 

454 

Pavg(r) 

146.42 

194.98 

257.56 

299.23 

327.24 

343.83 

5(f) 

83 

92 

100 

100 

100 

100 

p*{f) 

251 

314 

C = 100.0 

369 

416 

428 

456 

Pavg(r) 

170.40 

212.98 

247.72 

305.07 

330.47 

345.85 

5(f) 

92 

97 

100 

100 

100 

100 


in this connection that in some practical applications the discovering of a 
local minimizer can be also very useful. Figure [5l left part, shows that when 
local solutions are of an interest, the algorithm SmoothD can provide a fast 
convergence to local minimizers with small values of r. 

In order to avoid an excessive exploration of the search region, several ap¬ 
proaches can be used. For example, an adaptive adjustment of the reliability 
parameter can be performed using the following formula 

- C 

r = r-F-^, (20) 

where k is the iteration number of the method and C is a positive constant. 
In this way, at the initial iterations of the method (when k is small) a greater 
attention is paid to the exploration of the whole search domain D while for 
high values of k the influence of C diminishes and the reliability parameter 
becomes closer to its fixed value f. As it can be seen from results in Table [3] 
(where the same values f as in Table |2] were used but with different values 
C > 0 in (|20p l. this adaptive technique improves the method performance. 

As a practical recommendation for solving real-life global optimization 
problems, the following procedure can be adopted. An initial number of 
trials that are to be performed is fixed and the method is started with a 
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small value of r. Once the method stopped, it is restarted on the same pool 
of trial points (without re-evaluating f{x) and f'{x)) but with a greater value 
of r. If after repeating these restarts several times the method converges to 
the same point, this point is accepted as an approximation of the global 
minimizer. For more information on the reliability parameter and other 
parameters of this kind in Lipschitz global optimization see, e. g., |481151) . 

4.3 Operating characteristics of the compared methods 

The overall numerical results for three compared methods on the considered 
GKLS test classes are summarized in this subsection. Since the DIRECT- 
type methods under examination do not have any internal stopping criteria, 
they stopped either when condition (I19p was satisfied or when Pmax trials 
were performed. During their work, these methods evaluate only the ob¬ 
jective function f{x) from ([T|) (and do not evaluate f'{x) as the SmoothD 
method does), therefore the operation of executing a trial is less expensive in 
the DIRECT-type algorithms with respect to that of the new method. The 
balancing parameter e of the DIRECT and DIRECT/ methods (see cm 120]) 
was set equal to e = 10“^, as recommended by many authors (see, e.g., the 
review of different techniques for setting this parameter in [33]). 

Formula ()20l) was used in the SmoothD method to set the reliability 
parameter r from with the value ^ = 10 ® in m and the values C 
from (|20l) set in relation to the problems dimension (C equal to 50, 100, 150, 
and 200 in the case of N equal to 2, 3, 4, and 5, respectively). Moreover, 
several values of f can be fixed for the entire class starting from the initial 
value f = 1.1 (the maximal values of this coefficient were equal to 2.80, 
5.80, 3.60, 4.30, 5.80, 6.60, 4.10, and 7.80 for the CKLS classes from 1 to 8, 
respectively). 

The algorithms performance can be conveniently visualized by the op¬ 
erating characteristics (introduced in 1978 in |15] . see, e.g., |51] for their 
English-language description). The operating characteristics are formed by 
the pairs (p, S{p)) where p is the number of trials and S{p) (0 < 5(p) < S = 
100) is the number of test problems (among a set of S tests) solved by a 
method with less than or equal to p function trials. Each pair (for increasing 
values of p) corresponds to a point on the plane. Higher is the graph of the 
operating characteristics of a method with respect to another one, better is 
the method performance on the considered set of tests. 

In Figures [6| and 13 operating characteristics of the compared methods 
are shown, respectively, on simple and hard CKLS classes from Table [T] 
The numbers p* of trials required for the methods to satisfy condition (1191) 
for all 100 functions of a test class are also indicated in these Figures (after 
performing Pmax = 1 000 000 trials, the DIRECT method was unable to solve 
4, 4, 7, 1, and 16 problems of the CKLS classes 4-8, respectively, and the 
DIRECT/ method was unable to solve 4 problems of the CKLS class 8). 
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GKLS test class 1 (dimension N=2) GKLS test class 3 (dimension N='i) 



Trials, p ,„4 Trials, p 

‘ X 10 ' 


Figure 6: Operating characteristics for the compared methods SmoothD, 
DIRECT, and DIRECT/ on ‘simple’ GKLS test classes 1, 3, 5, and 7 from 
Table [T] 


It can be seen from the graphs reported in Figures [6] and [3 that on 
relatively simple functions of each GKLS class (approximatively, half of 100 
problems of the class quickly solved by every method) the SmoothD method 
behaves similarly to the DIRECT and DIRECT/ methods, in terms of the 
function trials performed (see, e. g., the intersection of the vertical line cor¬ 
responding to p ~ 200 trials and the operating characteristics graphs on the 
simple two-dimensional GKLS class 1 in Figure [6|. However, when more and 
more difficult problems of the particular GKLS class are to be solved, the ad¬ 
vantage of the new method becomes more and more pronounced with respect 
to the competitors (see the ascent of the operating characteristics graphs of 
the SmoothD method on all the GKLS classes in Figures |6] and [7|. The 
most significant advantage with respect to both the DIRECT and DIRECT/ 
methods is obtained by the new method on the hard five-dimensional GKLS 
class 8 (which is the most difficult among the classes from Table [1]) . 

5 A brief conclusion 

A new method SmoothD has been proposed to solve global optimization 
problems with the objective function having the Lipschitz gradient. It gener- 
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GKLS test class 2 (dimension N=2) GKLS test class 4 (dimension N-3) 



GKLS test class 6 (dimension N=4) GKLS test class 8 (dimension N=5) 



Figure 7: Operating characteristics for the compared methods SmoothD, 
DIRECT, and DIRECT/ on ‘hard’ GKLS test classes 2, 4, 6, and 8 from 
Table [T] 


alizes the one-dimensional geometric algorithm using smooth auxiliary func¬ 
tions and adaptive estimate of the Lipschitz constant from |39j to the mul¬ 
tidimensional case by using the efficient diagonal scheme from mmS]. The 
proposed method belongs to the class of ‘Divide-the-Best’ algorithms for 
which strong convergence properties have been established in @01. Results 
of numerical experiments performed with the SmoothD method on several 
hundreds of GKLS test functions from [12] show that already in its basic 
version the proposed algorithm manifests a very nice performance and opens 
new directions for developing powerful global optimization tools. In the fu¬ 
ture, it could be interesting to consider, in the connection to the SmoothD 
method, such approaches as the local tuning |39[ 145] and local improvement 
techniques |281 l29l 148] . Another interesting direction of research is to de¬ 
velop parallel versions of the proposed method within a general framework 
introduced in im mi [H l5T| for constructing parallel global optimization 
techniques. 
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